Local Semantic Indexing Based on Partial Least Squares for Text Classification

Authors

  • Xue-Qiang Zeng
  • Guo-Zheng Li
  • Ming-Wen Wang
  • Geng-Feng Wu
Abstract

Semantic Indexing based on Partial Least Squares (SIPLS) is an effective feature extraction method for text classification. SIPLS integrates the global category information Y with the document-class matrix X to create the latent semantic spaces. However, the global latent space may not be optimal for each individual class. To address this, the Local SIPLS (LSIPLS) method is proposed, which creates one SIPLS space per class. Without the influence of global information, local discriminative components are easier to extract in LSIPLS. Compared with global SIPLS, LSIPLS achieves similar performance with considerably more compact dimensionality. Empirical results on the Reuters corpus show that LSIPLS is a powerful tool for text classification.
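The idea of building one latent space per class can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard single-response PLS (PLS1, NIPALS-style) as a stand-in for SIPLS, assumes rows of X are documents and columns are terms, and assumes each local space is fit against a 0/1 indicator of its class. The names `pls1_projection`, `fit_local_spaces`, and `project`, as well as the toy data, are hypothetical.

```python
import numpy as np


def pls1_projection(X, y, n_components):
    """Fit PLS1 with a NIPALS-style loop; return (column means, projection R).

    X: (n_docs, n_terms) array, rows are documents (an assumed convention).
    y: (n_docs,) numeric response, here a 0/1 class indicator.
    Assumes y is not constant and n_components does not exceed the rank
    of the centered X; no safeguards are included in this sketch.
    """
    mean = X.mean(axis=0)
    Xk = X - mean
    yk = y - y.mean()
    weights, loadings = [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # document scores on this component
        tt = t @ t
        p = Xk.T @ t / tt              # term loadings
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - t * (yk @ t) / tt    # deflate y
        weights.append(w)
        loadings.append(p)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    R = W @ np.linalg.inv(P.T @ W)     # maps centered X directly to scores
    return mean, R


def fit_local_spaces(X, labels, n_components):
    """Fit one latent space per class against that class's 0/1 indicator."""
    return {
        c: pls1_projection(X, (labels == c).astype(float), n_components)
        for c in np.unique(labels)
    }


def project(X, space):
    """Project documents into one class-specific latent space."""
    mean, R = space
    return (X - mean) @ R


# Toy data (hypothetical): 6 documents, 4 terms, 2 classes.
X_demo = np.array([
    [2, 1, 0, 0],
    [3, 0, 1, 0],
    [2, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [1, 0, 2, 2],
], dtype=float)
labels_demo = np.array([0, 0, 0, 1, 1, 1])

spaces = fit_local_spaces(X_demo, labels_demo, n_components=2)
T1 = project(X_demo, spaces[1])  # documents in the latent space of class 1
```

Under these assumptions, a separate binary classifier could then be trained on each class's low-dimensional projection. How the per-class outputs are combined is a design choice that the abstract does not specify.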

Related resources

Cross-lingual Information Retrieval Model based on Bilingual Topic Correlation

Constructing relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take bilingual semantic relationships into account. The paper proposes a new model that builds semantic relationships between bilingual parallel documents via partial least squares (PLS)....

Enhancing User Search Experience in Digital Libraries with Rotated Latent Semantic Indexing

This study investigates a semi-automatic method for creating topical labels that represent the topical concepts in information objects. The method is called rotated latent semantic indexing (rLSI). rLSI has found application in text mining but has not been used for topical label generation in digital libraries (DLs). The present study proposes a theoretical model and an evaluation framework w...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...

Role of semantic indexing for text classification

The Vector Space Model (VSM) of text representation suffers from a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption, where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier

The task of Text Classification (TC) is to automatically assign natural language texts to thematic categories from a predefined category set. Latent Semantic Indexing (LSI) is a well-known technique in Information Retrieval, especially for dealing with polysemy (one word can have different meanings) and synonymy (different words are used to describe the same concept), but it is not an opti...

Publication date: 2008